Introduction

Pitchers spend their professional lives developing their “pitching arsenal” in order to become elite. We generally consider an “ace” to be such a pitcher: one with a deep arsenal of pitches that vary widely in speed and movement. When one of these pitches is performing well, it feeds into the effectiveness of the others, since at the end of the day pitchers want to confuse batters. With the rise of data analytics in recent years, we have seen a decrease in fastball usage and an increase in the use of secondary pitches in order to keep batters on their toes. This developed skill and dedication to the craft of pitching should be rewarded; however, pitchers may be confusing not only batters but also umpires.

Baseball, like most sports, depends on good officiating. The home-plate umpire in particular may have more influence on the outcome of a game than any other official across all major sports. Most notably, the umpire is responsible for calling a strike or a ball on every pitch the batter does not swing at. In this analysis we are interested in whether umpires are biased towards specific pitch types when these pitches are on the border of the strike-zone. Thus, in the following sections we explore the data and statistically model whether pitch-type impacts the probability of a strike being called when all other conditions are comparable.

Data Exploration

For this analysis we used data from the 2018 MLB season. The following variables are included in the dataset.

variables variable_descriptions
pitch_type The type of pitch thrown by the pitcher
release_speed The speed of the ball when it leaves the pitcher’s hand
pitcher A unique id that represents the pitcher throwing the pitch
balls The number of balls the batter has during the current pitch
strikes The number of strikes the batter has during the current pitch
plate_x The horizontal coordinate of the ball as it passes over the plate
plate_z The vertical coordinate of the ball as it passes over the plate
home_score Runs scored by the home team at the time of the pitch
away_score Runs scored by the away team at the time of the pitch
call The call made by the umpire for that pitch

Let’s begin our exploration of the data with the location coordinates of each pitch as it crosses home plate.

Here we can get a general idea of the strike zone. The strike zone in major league baseball is defined as the following:

Therefore, the strike zone is a function of the batter, and thus we cannot conclusively categorize all pitches as a strike or a ball, given that we do not possess information about the batter on each pitch.

Defining Borderline Pitches

Since we are interested in whether pitch-type may affect the call on “edge-cases”, we need to subset the data so that it only includes pitches close to the edge of the strike zone, that is, areas that umpires have a difficult time calling. This is necessary because the spatial variables will dominate any statistical model of the call, and any impact of the other variables would be overshadowed.

A quick note on missing data

  • This data set contains 121 observations/rows with missing values. Some of these observations are missing the pitch-type or the release-speed, but all of them are missing the (x,z) coordinates of the pitch as it crosses the plate. These coordinates are needed to define “edge-case” pitches, so imputation would be one option here. However, given that the initial data set contains 20,528 observations of pitches, we opt to simply remove these rows.
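The row removal described above can be sketched as follows. This is only an illustration: it assumes each pitch is held as a dictionary keyed by the variable names from the table, that a missing value is represented as `None`, and the three example rows are made up.

```python
# Columns that must be present for a pitch to be usable (names from the variable table).
REQUIRED = ("pitch_type", "release_speed", "plate_x", "plate_z")


def drop_incomplete(rows, required=REQUIRED):
    """Keep only rows where every required field is present (not None)."""
    return [row for row in rows if all(row.get(col) is not None for col in required)]


# Made-up rows for illustration only; None marks a missing value (an assumption).
pitches = [
    {"pitch_type": "FF", "release_speed": 94.1, "plate_x": 0.3, "plate_z": 2.4},
    {"pitch_type": None, "release_speed": None, "plate_x": None, "plate_z": None},
    {"pitch_type": "CU", "release_speed": 78.6, "plate_x": None, "plate_z": None},
]
complete = drop_incomplete(pitches)
```

Dropping rows rather than imputing them costs little here because the incomplete rows are a small share of the data.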

Two general strategies can be used to define borderline pitches:

  • A naive approach can be taken where we eyeball the area of interest, cut out an outer rectangle and an inner rectangle, and keep only the points between the two.
  • Another approach would be to model the call solely based on the location variables in some fashion and identify observations that it cannot classify with a large amount of certainty.
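For comparison, the first (naive) strategy could be sketched as below. The rectangle bounds are placeholder numbers chosen for illustration, not values taken from this analysis, and they are expressed in whatever units plate_x and plate_z use.

```python
def inside(x, z, rect):
    """True if (x, z) lies within rect = (x_min, x_max, z_min, z_max)."""
    x_min, x_max, z_min, z_max = rect
    return x_min <= x <= x_max and z_min <= z <= z_max


def is_borderline(x, z, inner, outer):
    """A pitch is borderline if it falls inside the outer box but outside the inner box."""
    return inside(x, z, outer) and not inside(x, z, inner)


# Hypothetical bounds, chosen by eye; these are NOT the author's values.
INNER = (-0.5, 0.5, 2.0, 3.0)
OUTER = (-1.2, 1.2, 1.2, 3.8)
```

The weakness of this approach is visible in the sketch itself: the result depends entirely on hand-picked bounds, and a rectangle cannot follow the zone that umpires actually call.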

We opt for the second option for this task. Specifically, a K-nearest neighbors (KNN) classifier is used. For each observation, KNN identifies the K points that are closest to it, then estimates the conditional probability of that observation belonging to a class as the fraction of those K points that belong to that class. The observation is then assigned to the class with the largest estimated probability. For this analysis we chose K equal to 15. After filtering out observations to which KNN assigns a class probability of 100% or 0%, we are left with the following data points.
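The KNN filter described above can be sketched in a few lines. This is a minimal illustration rather than the code used for the analysis: it assumes Euclidean distance on (plate_x, plate_z), excludes the query pitch from its own neighbor set (the original may not), and runs on a made-up one-dimensional toy data set with K = 3 instead of 15.

```python
import math


def knn_strike_probability(points, labels, query_index, k=15):
    """Fraction of the k nearest neighbors (excluding the pitch itself) called a strike."""
    qx, qz = points[query_index]
    distances = []
    for i, (x, z) in enumerate(points):
        if i == query_index:
            continue  # assumption: a pitch does not vote for itself
        distances.append((math.hypot(x - qx, z - qz), i))
    distances.sort()
    nearest = [i for _, i in distances[:k]]
    return sum(labels[i] == "strike" for i in nearest) / len(nearest)


def borderline_indices(points, labels, k=15):
    """Indices of pitches whose estimated strike probability is neither 0 nor 1."""
    keep = []
    for i in range(len(points)):
        p = knn_strike_probability(points, labels, i, k)
        if 0.0 < p < 1.0:
            keep.append(i)
    return keep


# Toy data: pitches along a horizontal line, called a strike when |x| <= 1.
xs = [-3.0 + 0.5 * i for i in range(13)]
points = [(x, 2.5) for x in xs]
labels = ["strike" if abs(x) <= 1.0 else "ball" for x in xs]
keep = borderline_indices(points, labels, k=3)
```

In the toy data, pitches in the middle of the zone and far outside it have unanimous neighbors and are dropped, while pitches near |x| = 1 have mixed neighbors and are kept.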

Now we can make out where the strike zone is and, more importantly, the zone that umpires have a difficult time calling. Before we move on to modeling, let’s see what data we are left with after sub-setting.

Size of Data
Number of pitches 7009
Number of variables recorded 10

Call Number of calls made
ball 3526
strike 3483

The first table shows the dimensions of the data (number of pitches and features recorded) after sub-setting it. The second shows how many of the remaining pitches were called balls and strikes.

Pitch-type: Throughout this analysis each individual pitch is referred to by an abbreviation; a general description of these pitch types is provided in the appendix. Here we provide a general overview of the pitch-type variable in this data.

Statistical Methodology

Once again, the primary goal of this analysis is to assess whether or not the pitch-type impacts the probability of a strike being called if all other conditions are comparable. To this end, we would like to develop a strong statistical model for the umpire’s call using the other features of the data set. For this purpose, multiple statistical models were developed, trained, and tested in a comprehensive search for the model that gives the best predictions/estimates. Three statistical frameworks are utilized: Logistic Regression, Random Forests, and Bagging.

A note on Logistic-Regression and the “pitcher” variable:

Tuning Models

In order to produce competitive models, hyper-parameters (parameters that are not determined by the model or data alone) must be tuned. The tuning procedure for the random forest model is detailed here:

  • Random Forest Tuning parameters:
    • mtry: the number of predictors randomly selected for consideration at each split
    • minsplit: the minimum number of observations that must exist in a node in order for a split to be attempted
    • Cp: the complexity parameter. Any split that does not decrease the overall lack of fit by a factor of Cp is not attempted
  • 150 combinations of the above three parameters were tested via 5-fold cross validation which was repeated 10 times to account for variations in splits

Note that sample size and the number of trees were not tuned in order to limit the computational cost; future work may include tuning these parameters. The same method was used to tune the bagging model, using only the parameters minsplit and Cp.
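The repeated cross-validation search described above can be sketched as follows. This is an illustration under stated assumptions: `score_fn` is a hypothetical callback that fits a model with the given parameters on the training indices and returns a held-out score (higher is better), and the criterion used for tuning in the original analysis is not stated, so none is built in here.

```python
import random


def repeated_kfold(n_rows, n_folds=5, n_repeats=10, seed=0):
    """Yield (repeat, fold, train_idx, test_idx) for repeated k-fold cross-validation."""
    rng = random.Random(seed)
    for repeat in range(n_repeats):
        order = list(range(n_rows))
        rng.shuffle(order)  # a fresh random partition for each repetition
        folds = [order[f::n_folds] for f in range(n_folds)]
        for f, test_idx in enumerate(folds):
            train_idx = [i for g, fold in enumerate(folds) if g != f for i in fold]
            yield repeat, f, train_idx, test_idx


def tune(grid, n_rows, score_fn, n_folds=5, n_repeats=10):
    """Return the grid entry with the best mean held-out score, and that score."""
    best_params, best_score = None, float("-inf")
    for params in grid:
        # The default seed gives every parameter combination the same splits.
        scores = [score_fn(params, train, test)
                  for _, _, train, test in repeated_kfold(n_rows, n_folds, n_repeats)]
        mean_score = sum(scores) / len(scores)
        if mean_score > best_score:
            best_params, best_score = params, mean_score
    return best_params, best_score
```

Here `grid` would be a list of 150 dictionaries such as `{"mtry": 3, "minsplit": 20, "cp": 0.01}` (example values only). Reusing the same splits for every combination means differences in the mean score reflect the parameters rather than the luck of the partition.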

Cross-Validation Between Models

For the selection of the final model, 5-fold cross-validation was performed with 10 repetitions, with log-likelihood used as an objective measure of fit. Here we show the log-likelihood computed for each repetition:

We can see that the Random Forest performs best of the three models, with the highest computed log-likelihood and only minor differences across repetitions. Note that the Logistic Regression model seemingly performs better than the Bagging model; however, a closer look indicated that almost all of its class probabilities are computed to be about 50%. As a result, we move forward with the Random Forest model.
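For reference, the log-likelihood of a set of predicted strike probabilities can be computed as in the sketch below. It assumes calls are coded 1 for strike and 0 for ball, and it clips probabilities away from 0 and 1 (the `eps` argument, an assumption of this sketch) so that a single overconfident miss does not produce negative infinity.

```python
import math


def log_likelihood(calls, p_strike, eps=1e-15):
    """Bernoulli log-likelihood of observed calls (1 = strike, 0 = ball)."""
    total = 0.0
    for y, p in zip(calls, p_strike):
        p = min(max(p, eps), 1.0 - eps)  # guard against log(0)
        total += math.log(p) if y == 1 else math.log(1.0 - p)
    return total


calls = [1, 0, 1, 0]
uninformative = log_likelihood(calls, [0.5, 0.5, 0.5, 0.5])
confident = log_likelihood(calls, [0.9, 0.1, 0.9, 0.1])
```

A model that predicts 50% for every pitch scores exactly n · log(0.5) on n pitches regardless of the calls. This explains how such a model can beat one that is more decisive but sometimes confidently wrong, even though it carries no information about the call.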

Measuring Impact of Pitch-Type

In order to assess whether or not the pitch type impacts the probability of a strike being called, we resort to simulation-based methods, specifically bootstrapping. Bootstrap simulation is used because the chosen model is semi-parametric in nature, as opposed to a parametric model such as logistic regression, where coefficients can be interpreted directly. For each pitch type, the dataset is transformed such that all the pitches in the zone of interest have that pitch-type and are thrown at that pitch-type’s average speed over the entire data set. This is done so that we may determine whether, on average, a pitch-type in the questionable zone affects the call. Note that for different sub-domains of locations in the zone of interest our variables may have different distributions; future work may explore these sub-domains further. 1,000 model-based bootstrap simulations are done on the original truncated dataset (the dataset that was obtained in the “Defining Borderline Pitches” section) and used to predict each of the pitch-type datasets.
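The procedure above can be sketched as follows. Several things here are assumptions made for illustration: the sketch resamples rows with replacement, which may differ from the “model-based” bootstrap actually used; `fit` is a hypothetical callback that trains a model on a sample and returns a function giving the predicted strike probability for a row; average speeds are computed from the rows passed in; and the toy `fit_rate_by_type` model and data stand in for the tuned random forest and the real pitches.

```python
import random


def counterfactual(rows, pitch_type, mean_speed):
    """Copy of rows with every pitch relabelled as pitch_type at that type's mean speed."""
    return [dict(r, pitch_type=pitch_type, release_speed=mean_speed[pitch_type]) for r in rows]


def bootstrap_mean_probability(rows, pitch_types, fit, n_boot=1000, seed=0):
    """For each pitch type, collect n_boot bootstrap draws of the mean predicted strike probability."""
    rng = random.Random(seed)
    mean_speed = {}
    for t in pitch_types:
        speeds = [r["release_speed"] for r in rows if r["pitch_type"] == t]
        mean_speed[t] = sum(speeds) / len(speeds)
    datasets = {t: counterfactual(rows, t, mean_speed) for t in pitch_types}
    draws = {t: [] for t in pitch_types}
    for _ in range(n_boot):
        sample = [rng.choice(rows) for _ in rows]  # resample rows with replacement
        predict = fit(sample)                      # refit the model on the resample
        for t in pitch_types:
            probs = [predict(r) for r in datasets[t]]
            draws[t].append(sum(probs) / len(probs))
    return draws


def percentile_interval(values, level=0.95):
    """Percentile bootstrap interval from a list of draws."""
    ordered = sorted(values)
    lo = ordered[int((1 - level) / 2 * (len(ordered) - 1))]
    hi = ordered[int((1 + level) / 2 * (len(ordered) - 1))]
    return lo, hi


def fit_rate_by_type(sample):
    """Toy stand-in model: the strike rate per pitch type, falling back to the overall rate."""
    overall = sum(r["call"] == "strike" for r in sample) / len(sample)
    counts = {}
    for r in sample:
        n, s = counts.get(r["pitch_type"], (0, 0))
        counts[r["pitch_type"]] = (n + 1, s + (r["call"] == "strike"))
    rates = {t: s / n for t, (n, s) in counts.items()}
    return lambda row: rates.get(row["pitch_type"], overall)


# Made-up data: 40 two-seam fastballs and 40 curveballs with different strike rates.
toy = ([{"pitch_type": "FT", "release_speed": 92.0, "call": "strike"}] * 28
       + [{"pitch_type": "FT", "release_speed": 92.0, "call": "ball"}] * 12
       + [{"pitch_type": "CU", "release_speed": 78.0, "call": "strike"}] * 12
       + [{"pitch_type": "CU", "release_speed": 78.0, "call": "ball"}] * 28)
draws = bootstrap_mean_probability(toy, ["FT", "CU"], fit_rate_by_type, n_boot=200)
ft_interval = percentile_interval(draws["FT"])
cu_interval = percentile_interval(draws["CU"])
```

Because every counterfactual dataset keeps the original locations and counts, differences between the pitch-type intervals reflect what the model attributes to pitch type and speed rather than to where those pitches tend to be thrown.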

Results

Below we can see the results from the bootstrapped simulation using our best tuned model:

On average, umpires seem to call most pitch types in the borderline-zone roughly equally. However, we do see a large disparity in our point estimates of the predicted probability between a curveball and a two-seam fastball. A two-seam fastball has about a 12% increase in probability of getting a strike call compared to a curveball on average in the borderline-zone. Based on the bootstrapped confidence intervals, we can also be 95% confident that these pitch-types have differing probabilities of being called strikes on average over the questionable-zone. It is interesting to note that a curveball is a breaking pitch that has more movement than just about any other pitch, whereas the two-seam fastball is generally one of a pitcher’s fastest pitches. More movement seems to be tricking not only batters but also umpires on average.

Comments and Future Work

In this analysis we have scratched the surface of the impact that a pitcher’s pitch-type has on the umpire’s call. A few interesting questions have arisen that may be worth dedicating further analysis and exploration to.

  • We looked at borderline pitches around the strike-zone in general. Identifying whether different sub-domains of this zone give different results could be interesting.
  • The two pitches that have the largest difference in call probability are a high-movement pitch and a low-movement fast pitch. It might be interesting to recategorize pitches as: high speed with low movement, high speed with high movement, low speed with high movement, and low speed with low movement.
  • It may also be of interest to see if the duration of movement has an effect on the call, as some pitches move over the full distance from the mound to home plate, while others break hard just before the plate.
  • Pitch-types are somewhat arbitrary, as they describe a pitch’s movement and speed in general, along with the grip of the pitch. However, some pitchers hold the ball slightly differently, and some throw pitches that seemingly fall between the descriptions of two pitches. It might be interesting to gather movement data along with speed and redo this analysis without our pitch-type classifications.
  • Lastly, in this analysis we only analysed the physical limitations of umpires. It would be interesting to see if any human-emotional factors affect their calls.

Appendix

Pitch-type descriptions:
pitches pitch_descriptions
CH Changeup: A changeup is one of the slowest pitches thrown in baseball, and it is predicated on deception. A good changeup will cause a hitter to start his swing well before the pitch arrives, resulting in either a swing and miss or very weak contact. But when a hitter is able to identify the changeup, the pitch is among the easiest to hit because of its low velocity.
CU Curveball: A curveball is a breaking pitch that has more movement than just about any other pitch. It is thrown slower and with more overall break than a slider, and it is used to keep hitters off-balance. When executed correctly by a pitcher, a batter expecting a fastball will swing too early and over the top of the curveball.
EP Eephus: The eephus is one of the rarest pitches thrown in baseball, and it is known for its exceptionally low speed and ability to catch a hitter off guard.
FC Cutter: A cutter is a version of the fastball, designed to move slightly away from the pitcher’s arm-side as it reaches home plate. Cutters are not thrown by a large portion of Major League pitchers, but for some of the pitchers who possess a cutter, it is one of their primary pitches.
FF Four-Seam Fastball: A four-seam fastball is almost always the fastest and straightest pitch a pitcher throws. It is also generally the most frequently utilized.
FO Forkball: One of the rarest pitches in baseball, the forkball is known for its severe downward break as it approaches the plate. Because of the torque involved with snapping off a forkball, it can be one of the more taxing pitches to throw.
FS Splitter: A pitcher throws a splitter by gripping the ball with his two fingers split on opposite sides of the ball. When thrown with the effort of a fastball, the splitter will drop sharply as it nears home plate.
FT Two-Seam Fastball: A two-seam fastball is generally one of a pitcher’s fastest pitches, although it doesn’t have quite the same velocity as a four-seam fastball. A two-seam fastball is one of the most frequently thrown pitches in baseball.
KC Knuckle-curve: The knuckle-curve is one of baseball’s greatest paradoxes, given that a curveball is defined by its spin and a knuckleball is defined by its lack thereof. Still, the knuckle-curve produces the desired effect of the two pitches – a slow, curveball break mixed with the unpredictable fluttering of the knuckleball.
SI Sinker: The sinker is a pitch with hard downward movement, known for inducing ground balls. It’s generally one of the faster pitches thrown and, when effective, induces some of the weakest contact off the bats of opposing hitters.
SL Slider: A slider is a breaking pitch that is thrown faster and generally with less overall movement than a curveball. It breaks sharply and at a greater velocity than most other breaking pitches. The slider and the curveball are sometimes confused because they generally have the same purpose – to deceive the hitter with spin and movement away from a pitcher’s arm-side. When a pitch seems to toe the line between the two, it is referred to in slang as a slurve.

Code Available Upon Request